Encoding issues in C

July 17, 2017 in C++ Learning

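Introduction to encodings

Source code often cannot avoid containing Chinese text, and when it does we have to think about how that text is encoded. Ignoring the encoding can lead to garbled output or loss of information. Commonly used encodings for Chinese text include GBK, GB2312, and the Unicode encodings such as UTF-8. See the following articles for a detailed introduction: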

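Encoding conversion in C

In C, text can be converted from one encoding to another with the iconv family of functions.
Header file and commonly used functions: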

#include <iconv.h>
typedef void* iconv_t;
extern iconv_t iconv_open(const char* to_code, const char* from_code);
extern size_t iconv(iconv_t cd, char** restrict inbuf, size_t* in_left_buf, char** restrict outbuf, size_t* out_left_buf);
extern int iconv_close(iconv_t cd);

iconv_open

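Description

This function declares which two encodings the conversion will go between and returns a conversion descriptor (a handle). On failure it returns (iconv_t)(-1) and sets errno.

Parameters

to_code: the target encoding
from_code: the source encoding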
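
The argument order (target first, source second) is easy to get backwards, so a small check can help. The following is a minimal sketch, assuming a POSIX system where iconv is available; `checkConversion` is a hypothetical helper name, not part of the iconv API.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: returns an empty string if the conversion is
// supported, otherwise a short description of why iconv_open failed.
std::string checkConversion(const char* to_code, const char* from_code) {
  iconv_t cd = iconv_open(to_code, from_code);  // note the order: target first
  if (cd == (iconv_t)(-1)) {
    // errno is EINVAL when the implementation does not support this pair.
    return std::string("iconv_open failed: ") + std::strerror(errno);
  }
  iconv_close(cd);  // release the descriptor once it is no longer needed
  return "";
}
```

Calling `checkConversion("UTF-8", "GBK")` would test whether a GBK to UTF-8 conversion is available on the current system; which encoding names are accepted depends on the iconv implementation.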

iconv

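extern size_t iconv(iconv_t cd, char** restrict inbuf, size_t* in_left_buf, char** restrict outbuf, size_t* out_left_buf);

Description

This function reads data from inbuf, converts it to the target encoding, and writes the result to outbuf. On success it returns the number of characters that were converted in a non-reversible way (usually 0), not the number of bytes converted; on failure it returns (size_t)(-1) and sets errno. Each call advances *inbuf and *outbuf and decrements the two byte counts accordingly.

Parameters

cd: the conversion descriptor obtained from iconv_open
inbuf: address of a pointer to the input buffer
in_left_buf: number of bytes in the input buffer that have not been converted yet
outbuf: address of a pointer to the output buffer
out_left_buf: number of bytes of free space remaining in the output buffer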
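
Because iconv reports progress by updating its pointer and length arguments rather than through its return value, it can help to see the bookkeeping spelled out. This is a sketch under the assumption of a POSIX iconv; `StepResult` and `convertOnce` are illustrative names introduced here.

```cpp
#include <iconv.h>
#include <cstddef>
#include <string>

struct StepResult {
  bool ok;          // false if iconv_open or iconv reported an error
  size_t consumed;  // input bytes that iconv read
  size_t produced;  // output bytes that iconv wrote
};

// Runs a single iconv call and derives the byte counts from the
// in/out length arguments that iconv updated.
StepResult convertOnce(const std::string& input, const char* to_code,
                       const char* from_code) {
  StepResult r = {false, 0, 0};
  iconv_t cd = iconv_open(to_code, from_code);
  if (cd == (iconv_t)(-1)) return r;

  char out[256];
  char* inPtr = const_cast<char*>(input.data());  // iconv does not modify the input
  char* outPtr = out;
  size_t inLeft = input.size();   // must be size_t, not int
  size_t outLeft = sizeof(out);

  size_t ret = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
  r.ok = (ret != (size_t)(-1));
  r.consumed = input.size() - inLeft;
  r.produced = sizeof(out) - outLeft;
  iconv_close(cd);
  return r;
}
```

For an ASCII input such as "abc" converted from UTF-8 to UTF-16LE, three bytes are consumed and six are produced, since each character takes two bytes in the target encoding.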

iconv_close

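extern int iconv_close(iconv_t cd);

Releases the conversion descriptor that iconv_open allocated. Despite the name, it is not a file descriptor.

Example conversion function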

#include <iostream>
#include <string>
#include <iconv.h>
#include <cstring>
#include <cerrno>
using namespace std;
// Converts p_str from encoding `from` to encoding `to`; returns "" on failure.
string convertCode(const string& p_str, const char* from, const char* to) {
  const size_t BUF_LEN = 10240;
  char bufOut[BUF_LEN];
  memset(bufOut, 0x0, sizeof(bufOut));
  iconv_t cd = iconv_open(to, from);
  if (cd == (iconv_t)(-1)) {
    std::cout << "open iconv error" << std::endl;
    return "";
  }
  // iconv takes size_t* lengths. Storing them in int and casting the
  // pointers corrupts the lengths on 64-bit systems.
  size_t lenin = p_str.length();
  size_t lenout = BUF_LEN;
  char* sin = const_cast<char*>(p_str.c_str());  // iconv does not modify the input
  char* sout = bufOut;
  size_t ret = iconv(cd, &sin, &lenin, &sout, &lenout);
  if (ret == (size_t)(-1)) {
    std::cout << strerror(errno) << std::endl;
    // EILSEQ (84 on Linux): invalid or incomplete multibyte or wide character.
    // Keep whatever was converted before the bad sequence; fail on anything else.
    if (errno != EILSEQ) {
      iconv_close(cd);
      return "";
    }
  }
  iconv_close(cd);
  // lenout now holds the unused space, so the difference is the output size.
  return string(bufOut, BUF_LEN - lenout);
}
int main() {
  // The bytes of this literal depend on how the source file is saved;
  // the call below assumes the file is GBK-encoded.
  string s = "哈哈";
  std::cout << s.length() << std::endl;
  s = convertCode(s, "gbk", "utf-8//IGNORE");
  std::cout << s.length() << std::endl;
  return 0;
}
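
A fixed output buffer limits how much text one call can convert. One possible way around this, sketched here under the assumption of a POSIX iconv, is to call iconv in a loop and reuse a small buffer whenever it reports E2BIG (output buffer full). `convertInChunks` is a hypothetical name, and the sketch leaves out the final flush call that stateful encodings would need.

```cpp
#include <iconv.h>
#include <cerrno>
#include <string>

// Hypothetical helper: converts all of `input`, reusing a small output
// buffer each time iconv reports E2BIG. Sets *ok to false if iconv_open
// fails or iconv reports any other error.
std::string convertInChunks(const std::string& input, const char* to_code,
                            const char* from_code, bool* ok) {
  *ok = false;
  std::string result;
  iconv_t cd = iconv_open(to_code, from_code);
  if (cd == (iconv_t)(-1)) return result;

  char chunk[64];  // deliberately small so the loop is exercised
  char* inPtr = const_cast<char*>(input.data());
  size_t inLeft = input.size();

  for (;;) {
    char* outPtr = chunk;
    size_t outLeft = sizeof(chunk);
    size_t ret = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    int err = errno;  // save it before any other call can change it
    result.append(chunk, sizeof(chunk) - outLeft);
    if (ret != (size_t)(-1)) {  // all input has been consumed
      *ok = true;
      break;
    }
    if (err != E2BIG) break;  // a real conversion error, stop here
  }
  iconv_close(cd);
  return result;
}
```

Because iconv keeps inPtr and inLeft up to date, each iteration simply resumes where the previous one stopped.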

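Why iconv can segfault

Calling iconv can end in a segmentation fault. The main cause becomes clear from the function prototype:

extern size_t iconv(iconv_t cd, char** restrict inbuf, size_t* in_left_buf, char** restrict outbuf, size_t* out_left_buf);

Both length parameters are pointers to size_t. If the lengths are stored in int variables and an int* is cast to size_t*, the cast goes wrong on some systems: iconv reads and writes more bytes than an int occupies, so the length it sees is wrong, memory is accessed out of bounds, and the program crashes with a segmentation fault. The error looks like this: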

Program received signal SIGSEGV, Segmentation fault.
from_gbk (irreversible=0x7fffffffb188, outend=0x61d7c0 "", outptrp=<synthetic pointer>,
    inend=0xa7ffffffdb76 <error: Cannot access memory at address 0xa7ffffffdb76>,
    inptrp=0x7fffffffb2e8, step_data=0x6157d0, step=0x615030) at ../iconv/loop.c:325
325	../iconv/loop.c: No such file or directory.
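
To make the failure mode concrete, the sketch below simulates what an 8-byte read sees when only 4 bytes hold the intended length. It uses memcpy on a local buffer so that the behaviour is well defined; `readAsEightBytes` is a hypothetical helper, and the actual stack layout of the crashed program above is not known.

```cpp
#include <cstdint>
#include <cstring>

// Simulates reading a 64-bit size_t from the address of a 32-bit int:
// the extra four bytes come from whatever happens to sit next to it.
uint64_t readAsEightBytes(int32_t length, uint32_t neighbour) {
  unsigned char memory[8];
  std::memcpy(memory, &length, 4);         // the int the caller meant to pass
  std::memcpy(memory + 4, &neighbour, 4);  // unrelated adjacent bytes
  uint64_t seen;
  std::memcpy(&seen, memory, 8);           // what an 8-byte read picks up
  return seen;
}
```

With a nonzero neighbour the value seen is far larger than the intended length, which would be consistent with the implausible inend address in the backtrace. Declaring the lengths as size_t removes the need for any cast.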

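size_t and int

size_t is defined in stddef.h. Its underlying type depends on the platform; on 32-bit architectures it is commonly defined as:

typedef unsigned int size_t;

On 64-bit machines it is commonly defined as:

typedef unsigned long size_t;

int is 4 bytes on both 32-bit and 64-bit machines, while long is 4 bytes on 32-bit machines and 8 bytes on typical 64-bit Unix-like systems. On a 64-bit machine, therefore, converting between size_t* and int* is bound to cause problems. On a 32-bit system the two types have the same width, so a pointer to a non-negative int causes no trouble, but a negative value is still a problem because it is read as a very large unsigned number.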
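
The widths involved can be checked directly on the machine at hand. The following is a small sketch; `describeWidths` and `toSizeT` are illustrative names, and the concrete numbers depend on the platform and compiler.

```cpp
#include <cstddef>
#include <string>

// Reports the type widths discussed above, in bytes. The values are
// platform dependent, so no particular output is assumed here.
std::string describeWidths() {
  return "int=" + std::to_string(sizeof(int)) +
         " long=" + std::to_string(sizeof(long)) +
         " size_t=" + std::to_string(sizeof(size_t));
}

// Converting a value (rather than a pointer) is always well defined:
// a negative int wraps around modulo 2^N, where N is the width of size_t.
size_t toSizeT(int n) { return static_cast<size_t>(n); }
```

`toSizeT(-1)` yields the largest value size_t can hold, which is why a negative length is dangerous even where int and size_t have the same width.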


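Using find_if to find a specific struct in a vector and to run range queries in C++

July 15, 2017 in C++ problems

A struct is not a built-in C++ type, so std::find cannot search for one directly unless the struct defines operator==, and std::find cannot express a range query at all. In these cases use find_if instead.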

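The find_if function

find_if is a function template. Its prototype and a possible definition:

template <class InputIterator, class Predicate>
InputIterator find_if(InputIterator first, InputIterator last, Predicate pred) {
  while (first != last && !pred(*first)) ++first;
  return first;
}

Parameters

first: iterator to the start of the range
last: iterator to the end of the range (one past the last element)
pred: a function or function object (functor) that tests each element. Iteration stops at the first element for which pred() returns true.

Return value

An iterator to the first element that satisfies pred, or last if there is none.

Notes

The most important part of this function is pred. When pred is a functor, its core is the overloaded operator(). Dereferencing an iterator of any container with * yields a value of that container's value_type, so the parameter of the overloaded operator() should be a (const) reference to value_type.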
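
The predicate does not have to be a functor. As a brief illustration, here is the same kind of search written once with a plain function and once with a C++11 lambda; the helper names are made up for this sketch.

```cpp
#include <algorithm>
#include <vector>

// A plain function used as the predicate.
bool isNegative(int x) { return x < 0; }

// Returns the index of the first negative element, or -1 if there is none.
int firstNegativeIndex(const std::vector<int>& v) {
  std::vector<int>::const_iterator it =
      std::find_if(v.begin(), v.end(), isNegative);
  return it == v.end() ? -1 : static_cast<int>(it - v.begin());
}

// The same kind of search with a lambda that captures a threshold.
int firstAboveIndex(const std::vector<int>& v, int threshold) {
  auto it = std::find_if(v.begin(), v.end(),
                         [threshold](int x) { return x > threshold; });
  return it == v.end() ? -1 : static_cast<int>(it - v.begin());
}
```

A lambda is convenient when the comparison value is only known at the call site, which is the role the constructor argument plays in a functor.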

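Using find_if

Finding a specific object in a vector of structs. Note in particular that the functor's operator() takes a const reference to the element type, whereas the finder's constructor takes the value you actually want to compare against; at the call site, pass that value to the constructor.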

#include <algorithm>
#include <vector>
#include <string>
#include <iostream>
using namespace std;
struct book{
	int m_iID;
	string m_strName;
	book(int t_id, string t_name):m_iID(t_id), m_strName(t_name) {}
};
// Functor that matches a book by its ID.
struct book_finder{
	int m_iID;  // the ID to look for
	book_finder(int t_id):m_iID(t_id) {}
	bool operator() (const book& t) const {return t.m_iID == m_iID;}
};
int main() {
	vector<book> bookVc;
	book book1(0, "Book 0");
	book book2(1, "Book 1");
	book book3(2, "Book 2");
	book book4(3, "Book 3");
	bookVc.push_back(book1);
	bookVc.push_back(book2);
	bookVc.push_back(book3);
	bookVc.push_back(book4);
	book target(1, "Book");
	// ID 1 is in the vector, so this prints "found 1".
	if (std::find_if(bookVc.begin(), bookVc.end(), book_finder(target.m_iID)) != bookVc.end()) {
		cout << "found 1" << std::endl;
	} else {
		cout << "did not find 1" << std::endl;
	}
	target.m_iID = 10;
	// ID 10 is not in the vector, so this prints "did not find 10".
	if (std::find_if(bookVc.begin(), bookVc.end(), book_finder(target.m_iID)) != bookVc.end()) {
		cout << "found 10" << std::endl;
	} else {
		cout << "did not find 10" << std::endl;
	}
}
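
The same pattern extends to range queries, which std::find cannot express. The sketch below is one possible way to do it; `book_id_in_range` and `firstNameInRange` are hypothetical names, and the book struct is repeated so that the block stands on its own.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct book {
  int m_iID;
  std::string m_strName;
  book(int t_id, const std::string& t_name) : m_iID(t_id), m_strName(t_name) {}
};

// Hypothetical functor: matches any book whose ID lies in [lo, hi].
struct book_id_in_range {
  int m_lo, m_hi;
  book_id_in_range(int lo, int hi) : m_lo(lo), m_hi(hi) {}
  bool operator()(const book& b) const {
    return b.m_iID >= m_lo && b.m_iID <= m_hi;
  }
};

// Returns the name of the first book whose ID is in range, or "" if none is.
std::string firstNameInRange(const std::vector<book>& books, int lo, int hi) {
  std::vector<book>::const_iterator it =
      std::find_if(books.begin(), books.end(), book_id_in_range(lo, hi));
  return it == books.end() ? std::string() : it->m_strName;
}
```

Only the functor changes compared with the exact-match case: the constructor stores both bounds, and operator() tests each element against them.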


魏传柳

魏传柳(2824759538@qq.com)


Tencent

ShenZhen,China